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ABSTRACT 

The paper focuses on the sparse approximation of signals 
using overcomplete representations, such that it preserves the 
(prior) structure of multi -dimensional signals. The underlying 
optimization problem is tackled using a multi-dimensional 
split Bregman optimization approach. An extensive empirical 
evaluation shows how the proposed approach compares to the 
state of the art depending on the signal features. 

Index Terms — Sparse approximation, Regularization, 
Fused-LASSO, Split Bregman, Multidimensional signals. 

1. INTRODUCTION 

Dictionary-based representations proceed by approximating 
a signal via a linear combination of dictionary elements, re- 
ferred to as atoms. Sparse dictionary-based representations, 
where each signal involves few atoms, have been thoroughly 
investigated for their good properties, as they enable robust 
transmission (compressed sensing [ 1 ]) or image in-painting 
Q. The dictionary is either given, based on the domain 
knowledge, or learned from the signals . 

The so-called sparse approximation algorithm aims at 
finding a sparse approximate representation of the considered 
signals using this dictionary, by minimizing a weighted sum 
of the approximation loss and the representation sparsity (see 
|4| for a survey). When available, prior knowledge about 
the application domain can also be used to guide the search 
toward "plausible" decompositions. 

This paper focuses on sparse approximation enforcing a 
structured decomposition property, defined as follows. Let 
the signals be structured (e.g. being recorded in consecutive 
time steps); the structured decomposition property then re- 
quires that the signal structure is preserved in the dictionary- 
based representation (e.g. the atoms involved in the approx- 
imation of consecutive signals have "close" weights). The 
structured decomposition property is enforced through adding 
a total variation (TV) penalty to the minimization objective. 



In the ID case, the minimization of the above overall 
objective can be tackled using the fused-LASSO approach 
first introduced in 0. In the case of multi-dimensional (also 
called multi-channel) signal^] however, the minimization 
problem presents additional difficulties. The first contri- 
bution of the paper is to show how this problem can be 
handled efficiently, by extending the (mono-dimensional) 
split Bregman fused-LASSO solver presented in [6], to the 
multi-dimensional case. The second contribution is a com- 
prehensive experimental study, comparing state-of-the-art 
algorithms to the presented approach referred to as Multi- 
SSSA and establishing their relative performance depending 
on diverse features of the structured signals. 

This paper is organized as follows. The Section [2] intro- 
duces the formal background. The proposed optimization ap- 
proach is described in Section[5] Section[4]presents our exper- 
imental settings and reports on the results. The presented ap- 
proach is discussed w.r.t. related work in Section|5]and the pa- 
per concludes with some perspectives for further researches. 

2. PROBLEM STATEMENT 

Let Y = [y-L, . . . , y T ] e R CxT be a matrix made of T C- 
dimensional signals, and $ 6 E c ' xjv an overcomplete dic- 
tionary of N normalized atoms (N > C). We consider the 
linear model: 

y t = $x f +e t , te{l,...,T} , 

in which X = [xi, . . . , x*r] <E M ArxT stands for the decom- 
position matrix and E = [ej., . . . , e^] <G M c ' xT is a Gaussian 
noise matrix. The sparse structured decomposition problem 
consists of approximating the y t , t e {1, . . . , T}, by decom- 
posing them on the dictionary $, such that the structure of the 
decompositions x t reflects that of the signals y t . This goal is 
formalized as the minimization of the objective function]^] 

mm\\Y -^Xg + X.WXW. + X^XPW, , (1) 



Our motivating application considers electro-encephalogram (EEG) sig- 
nals, where the number of sensors ranges up to a few tens. 

2 ||j4|Ip = 5Zj \ p ) * ■ The case p = 2 corresponds to the clas- 
sical Frobenius norm. 



where Ai and A2 are regularization coefficients and P encodes 
the signal structure (provided by the prior knowledge) as in 
Q. In the remainder of the paper, the considered structure is 
that of the temporal ordering of the signals, i.e. ||XP||i = 

Y<t=2 \\ x t - -X"*— i||i. 



Eq. <|3j and Eq. (|6]l could be resolved with the soft-thresholding 
operator: 

A i+1 = Soft/Threshold^,, ,, {X i+1 + D\) (7) 
B i+1 = SoftThresholdxan (X l+1 P + D B ) . (8) 



3. OPTIMIZATION STRATEGY 
3.1. Algorithm description 

Bregman iterations have shown to be very efficient for l\ reg- 
ularized problems J8) . For convex problems with linear con- 
straints, the split Bregman iteration technique is equivalent 
to the method of multipliers and the augmented Lagrangian 
one 13. The iteration scheme presented in ||6l considers an 
augmented Lagrangian formalism. We have chosen here to 
present ours with the initial split Bregman formulation. 

First, let us restate the sparse approximation problem: 



mm x ,A,B \\Y - $X\\ 2 2 + Ai|L4||i + AsH-B^ 



s.t. 



A = X 
B = XP 



(2) 



This reformulation is a key step of the split Bregman method. 
It decouples the three terms and allows to optimize them sep- 
arately within the Bregman iterations. To set-up this iteration 
scheme, Eq. |2]) must be transform to an unconstrained prob- 
lem: 

mmx.A.B \\Y - $X\\Z + A1PH1 + A 2 ||S||i 

+ if\\X-A\\l + f\\XP-B\\i ■ 

The split Bregman scheme could then be expressed as |8 1: 



i+l 



.4 



i+l 



B l+L ) 



argmin X A fl \\Y - $X\\% 
+A 1 |L4|| 1 +A 2 ||B|| 1 (3) 



+ !f\\X-A^ 
-if\\XP-B 



D\\\l 



= D\ + {X l+1 - A l+1 ) 
(X' l+1 P - B l+1 ) 



n i+1 — r> i 



Thanks to the split of the three terms, the minimization of 
Eq. Q could be performed iteratively by alternatively updat- 
ing variables in the system: 



X t+1 = argmin x \\Y - 



-f\\XP-B 



tf\\X-A l 
+ D B \\ 2 2 



d'aWI 



A 



i+l 



= argmin A A 1 ||A|| 1 + ^||X 



i+i 



A + D\\\l 



(4) 
(5) 



B t+1 ^argmiiis A 2 ||B||i + f \\X* +1 P -B + D B g (6) 

Only few iterations of this system are necessary for conver- 
gence. In our implementation, this update is only performed 
once at each iteration of the global optimization algorithm. 



Solving Eq. (|4]) requires the minimization of a convex differ- 
entiable function which can be performed via classical op- 
timization methods. We propose here to solve it determin- 
istically. The main difficulty in extending [6] to the multi- 
dimensional signals case rely on this step. Let us define H 
from Eq. Q such as: 

X i+1 = argmin x H{X) . 

Differentiating this expression with respect to X yields: 



dX 



H = (2$ T $ + fnI)X + X([i 2 PP T ) - 2$F (9) 
+ f i 1 (D A -A>)+V2(D> B -B*)P T , 



where / is the identity matrix. The minimum X = X l+1 
of Eq. Q is obtained by solving -^H(X) = which is a 
Sylvester equation: 

WX + XZ = AP , (10) 

with W = 2$ T $ + ml, Z = fi 2 PP T and M = -D\ + 
2$F + mA l + (fi 2 B l - D l B )P T . Fortunately, in our case, 
W and Z are real symmetric matrices. Thus, they can be 
diagonalized as follows: 

W = FD W F T 
Z = GD Z G T 



where F and G are orthogonal matrices. Eq. ( 10 1 becomes: 

D W X' + X'D z = M i ' , (11) 

with X' = F T XG and M u = F T M i G. 
X' is then obtained by: 

VtG {1,...,T} X'(:,t) = (D w + D z (t,t)I)- 1 M i '(:,t) 

where the notation (:, t) indices the column t of matrices. Go- 
ing back to X could be performed with: X = FX'G T . 
W and Z being independent of the iteration (i) considered, 
their diagonalizations are done only once and for all as well 
as the computation of the terms (D w + D z (t, t)J) _1 , Vf G 
{1, . . . , T}. Thus, this update does not require heavy compu- 
tations. The full algorithm is summarized below. 

3.2. Multi-SSSA sum up 

Inputs: Y, P. Parameters: Ai, A2, fi\, j-12, e, iterMax, kMax 
1: Init D% D° B , X° and set B° = X°P, A = X°, 
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W = 2$ J $ + Mi-f and Z = ^PP 1 . 
Compute D w , D z , F and G from W and Z. 
Precompute (t -» T), D\ smp = (A, + D„(t, t)I)~ 

while i < iter Max and l|x 1,2 > e do 

k = 

for fc — > kMax do 

M' = F T (2$ T Y" - ni(D' A - A temp ) - 
B temp )P T )G 

for i -> T do 

X te ™ p (:,t) = D|em P M'(:,t) 
end for 

A temp = So f tThreshold;( 



B iemp = Soft Threshold^ 
end for 

j^-j + l j^-temp. y^ + l j^temp . 

D 1 / 1 =D\ + (X 1+1 - A l+1 ) ' 
D 1 ^ 1 =D l B + {X l+1 P - B l+1 ] 
i = i + l 
end while 



L 

^y-temp p 



1-D'b) 



B' 



B 



t e mp 



4. EXPERIMENTAL EVALUATION 

The following experiment aims at assessing the efficiency of 
our approach in decomposing signals built with particular reg- 
ularities. We compare it both to algorithms coding each signal 
separately, the orthogonal matching pursuit (OMP) ifTUl and 
the LARS flT| (a LASSO solver), and to methods perform- 
ing the decomposition simultaneously, the simultaneous OMP 
(SOMP) and an proximal method solving the group-LASSO 
problem (FISTA ifTH ). 

4.1. Data generation 

From a fixed random overcomplete dictionary $, a set of K 
signals having piecewise constant structures have been cre- 
ated. Each signal Y is synthesized from the dictionary $ and 
a built decomposition matrix X with Y = &X 
The TV penalization of the fused-LASSO regularization 
makes him more suitable to deal with data having abrupt 
changes. Thus, the decomposition matrices of signals have 
been built as linear combinations of specific activities which 
have been generated as follows: 



o 



if i ^ ind 
if i = k 



where P E M. NxT , H is the Heaviside function, ind € 
{1, . . . , N} is the index of an atom, m is the center of the 
activity and d its duration. Each decomposition matrix X 
could then be written: 



X 



£ a>P 



where n a is the number of activities appearing in one signal 
and the a,; stand for the activation weights. An example of 
such signal is given in the Figure [Tjbelow. 




Fig. 1. Built signal, with C = 4 channels and N = 8 atoms. 



4.2. Experimental setting 

Each method has been applied to the previously created sig- 
nals. Then the distances between the estimated decomposi- 
tion matrices X and the real ones X have been calculated as 
follows: 



dist(X,X) 



\X-X\ 

\\x\u 



The goal was to understand the influence of the number 
of activities (n a ) and the range of durations (d) on the effi- 
ciency of the fused-LASSO regularization compared to oth- 
ers sparse coding algorithms. The scheme of experiment de- 
scribed above has been carried out with the following grid of 
parameters: 

• n a € {20, 30,..., 110}, 

)€ {(0.1,0.15), (0.2, 0.25),..., (1,1)} 

For each point in the above parameter grid, two sets of sig- 
nals has been created: a train set allowing to determine for 
each method the best regularization coefficients and a test set 
designed for evaluate them with these coefficients. 
Other parameters have been chosen as follows: 



Model 


Activities 


C = 20 


m~ U{0,T) 


T = 300 


a~7V(0,2) 


N = 40 


ind~U(l,N) 


K = 100 





indi ,ynt 



i=l 



Dictionaries have been randomly generated using Gaussian 
independent distributions on individual elements and present 
low coherence. 

4.3. Results and discussion 

In order to evaluate the proposed algorithm, for each point 
(i, j) in the above grid of parameters, the mean (among test 



Fig. 2. Left: Mean distances dist obtained with the Multi-SSSA. Middle: Difference between the mean distances obtained 
with the Multi-SSSA and those obtained with the LARS. Right: Difference between the mean distances obtained with the 
Multi-SSSA and those obtained with the Group-LASSO solver. The white diamonds correspond to non-significant differences 
between the means distances. 



signals) of the previously defined distance dist has been com- 
puted for each method and compared to the mean obtained by 
the Multi-SSSA. A paired t-test (p < 0.05) has then been per- 
formed to check the significance of these differences. 
Results are displayed in Figure [2] In the ordinate axis, the 
number of patterns increases from the top to the bottom and 
in the abscissa axis, the duration grows from left to right. The 
left image displays the mean distances obtained by the Multi- 
SSSA. Unsurprisingly, the difficulty of finding the ideal de- 
composition increases with the number of patterns and their 
durations. The middle and right figures present its perfor- 
mances compared to other methods by displaying the differ- 
ences (point to point) of mean distances in grayscale. These 
differences are calculated such that, negative values (darker 
blocks) means that our method outperform the other one. The 
white diamonds correspond to non-significant differences of 
mean distances. Results of the OMP and the LARS are very 
similar as well as those of the SOMP and the group-LASSO 
solver. We only display here the matrices comparing our 
method to the LARS and the group-LASSO solver. 

Compared to the OMP and the LARS, our method obtains 
same results as them when only few atoms are active at the 
same time. It happens in our artificial signals when only few 
patterns have been used to create decomposition matrices 
and/or when the pattern durations are small. On the contrary, 
when many atoms are active simultaneously, the OMP and 
LARS are outperformed by the above algorithm which use 
inter-signal prior information to find better decompositions. 
Compared to the SOMP and the group-LASSO solver, results 
depend more on the duration of patterns. When patterns are 
long and not too numerous, theirs performances is similar 
to the fused-LASSO one. The SOMP is outperformed in all 
other cases. On the contrary, the group-LASSO solver is out- 
performed only when patterns have short/medium durations. 



5. RELATION TO PRIOR WORKS 

The simultaneous sparse approximation of multi-dimensional 
signals has been widely studied during these last years ifFJl 



and numerous methods developed IT4l [T5l [T6l \T7\ 0] . More 
recently, the concept of structured sparsity has considered the 
encoding of priors in complex regularizations iTTSl [T9l . Our 
problem belongs to this last category with a regularization 
combining a classical sparsity term and a Total Variation one. 
This second term has been studied intensively for image de- 
noising as in the ROF model ||201|2T1 . 

The combination of these terms has been introduced as the 
fused-LASSO 0. Despite its convexity, the two t\ non- 
differentiable terms make it difficult to solve. The initial pa- 
per 1 5 1 transforms it to a quadratic problem and uses standard 
optimization tools (SQOPT). Increasing the number of vari- 
ables, this approach can not deal with large-scale problems. 
A path algorithm has been developed but is limited to the par- 
ticular case of the fused-LASSO signal approximator l22l . 
More recently, scalable approaches based on proximal sub- 
gradient methods [23 1, ADMM |24| and split Bregman itera- 
tions |6] have been proposed for the general fused-LASSO. 
To the best of our knowledge, the multi-dimensional fused- 
LASSO in the context of overcomplete representations has 
never been studied. The closest work we found considers 
a problem of multi-task regression [7]. The final paper had 
been published under a different title [25 1 and proposes a 
new method based on the approximation of the fused-LASSO 
TV penalty by a smooth convex function as described in [26 1 . 



6. CONCLUSION AND PERSPECTIVES 

This paper has shown the efficiency of the proposed Multi- 
SSSA based on a split Bregman approach, in order to achieve 
the sparse structured approximation of multi-dimensional sig- 
nals, under general conditions. Specifically, the extensive val- 
idation has considered different regimes in terms of the sig- 
nal complexity and dynamicity (number of patterns simulta- 
neously involved and average duration thereof), and it has es- 
tablished a relative competence map of the proposed Multi- 
SSSA approach comparatively to the state of the art. Further 
work will apply the approach to the motivating application 
domain, namely the representation of EEG signals. 
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